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Abstract 

This manuscript presents a linear algebra-based technique that only requires two 
unique photographs from a digital camera to mathematically construct a 3D surface 
representation which can then be 3D printed. Basic computer vision theory and man¬ 
ufacturing principles are also briefly discussed. 


1 Introduction 

1.1 Purpose 

The purpose of this paper is to describe an algorithm based on ideas from linear al¬ 
gebra and geometry to generate a 3D-printed model from a pair of similar photographs 
(see Figure [^. Our work is part of a broader movement to answer the question, “What 
can a mathematician do with a 3D printer?” for which there is some recent work done 
in areas such as fractals [7], knots m, smooth manifolds |10j . polyhedra pin!, a 
print based on Conway’s Game of Life m, and demonstrations of theoretic or his¬ 
torical constructs [8]. The recent availability of relatively inexpensive 3D printers has 
facilitated these projects. 

3D printers create objects through the technology of additive manufacturing - start¬ 
ing with nothing and building up the object layer by layer, using only the material 
needed for the object itself. The most affordable 3D printer models are extrusion- 
based models, where a thin plastic filament is fed into a heating element called an 
extruder which melts the filament and lays down a thin trail of plastic onto the build 
plate. 

The opportunities these printers present for at-home manufacturing naturally lead 
to the problem of printing replicas of an already-existing object. A number of websites 
and apps, such as [9] and m, utilize one approach to this problem, which involves 
taking many photographs of an object at varying angles. A high-resolution, 360°, 3D 
model of the object is produced from these photographs, which can then be printed. 

*EFA: Grand Valley State University; SVK: California State University (Sacramento), MS: Bard College 
at Simon’s Rock. Contact: aboufade@gvsu.edu 


1 




The drawbacks of this approach is that it requires a large number of photographs, often 
over 20, and does not function well on objects with smooth, homogeneous surfaces. 
We present an alternative method to 3D print models of existing objects. It is also 
photography-based, but utilizes only two photographs, has minimal restrictions on the 
nature of the object to be modeled, and is simple enough to be implemented using 
mathematical software packages such as Mathematica. The method produces a bas- 
relief model, which is sufficient for our purposes. 

We will begin with more information about 3D printing in general. Then, we will 
discuss the theoretical basis of creating 3D models from photographs via the recovery 
of depth data. Finally, we will describe specifically how the hand in Figure was 
generated from indicated photographs. 

1.2 About 3D Printing 

Inexpensive 3D printers that are now available typically create objects made of 
polylactide (PLA) or acrylonitrile butadiene styrene (ABS) plastic filament. The quick- 
to-solidify melted plastic is extruded onto a flat surface called the “build plate”. The 
extruder moves in the xy-plane parallel to the build plate. Once the extruder has 
deposited one layer of plastic onto the build plate, the build plate moves down slightly 
along the z-axis and the extruder lays a new horizontal layer on top of the previously 
extruded material. In this way, the printer stacks hundreds of horizontal layers on top 
of each other to manufacture user-designed objects. 

Computers represent 3D objects using a mesh, which is a collection of polygons 
in that share edges, forming one continuous surface. Both the type and number 
of the polygons on the surface can be altered to change the resolution of the object. 
The format of files containing information about meshes varies depending on the type 
of polygon used. A common file format is STereoLithographic (STL), which handles 
triangle meshes. STL files record the vertices and normal vectors of each triangle, and 
can be either binary (computer-readable) or ASCII (human-readable). For examples, 
see P[] and [2]. STL files, and other polygon mesh files, are transformed into instructions 
to the printer on how to print an object (see m), so once an STL representation of an 
object has been created, the object can be printed. STL files can be written using a text 
editor, or generated from a 3D designer program. In addition, Mathematica can create 
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an STL file for any 3D surface it can display. STL files representing a mathematically 
defined surface, our particular area of interest, can be created by defining a function 
z = f{x,y) to represent the height z oi a. surface at points {x,y) in a bounded region 
and then graphing this function and exporting the mesh of its graph as an STL hie. For 
example, altitude data (as a function of longitude and latitude) can be used to create a 
mesh, and STL hie, for a mountain. The fnnction will not necessarily be differentiable, 
but in order to print properly, we want it to be continuous. 

Some recent papers connecting mathematics and 3D printing include ID, m, m, 
[lo], and US]. 


2 Experimental Section: Underlying Theory On 
Extracting Depth Data 

There are two main techniques for hnding the 3D shape of an object from pho¬ 
tographs: photometric and geometric. Photometric methods use the manner in which 
different parts of the object rehect light to derive the orientation of parts of the object’s 
surface, by applying different “shading functions” m- These functions describe how 
light will bend when it hits a specihc material. However, to use the shading functions 
to recover the shape of an object, one must know exactly the type and direction of 
light used in the photo. As this is impractical for most photographs taken outside 
of a photography studio, we instead utilized a geometric approach, which relies upon 
the nature of photography as a projection from IR^ to or from the real world to 
the “image plane” of the flat photo. If one considers more than one photograph, this 
projection produces a number of similar triangles, from which depth can be recovered 
in a process aptly called triangulation. 

2.1 Perspective Projections 

An interesting article on determining camera location from a photograph, which 
is connected to our work, is [6]. As described in detail in that article, the projection 
that occnrs when an object is photographed is not linear, that is, it is not as simple as 
mapping {x,y,z) to (x,y, 0). Instead, in what is called a perspective transformation, 
a given point is projected onto the image plane along the ray connecting the point to 
the camera lens. “Taking a picture” maps a point (x, y, z) in 3-space to its perspective 
projection P by the mapping: 

P : {x,y,z) ^ 0), 

where / is the focal length, or the distance from the camera lens to the image plane. 
(See also [IS].) Under this projection, proportion and angle are preserved only for lines 
and shapes in a plane parallel to the camera (and thus to the image plane). All others 
are warped, with points further from the camera being affected less. 

This mapping provides one way to recover the depth of points: if the original x- 
and y- coordinates of a point in the focal length of the camera, and the point’s 
location in the image plane (which we will call its “pixel coordinates”) are all known, the 
depth z can be found easily. However, the significant external measurements required 
render this technique as impractical as the photometric methods. We will instead 
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Figure 2: AX 1 X 2 X ~ AC 1 C 2 X, therefore z = ^, where 2 ; is depth, / is the focal length, 
Xi is the projection of X by Camera 1 (Ci) and X 2 is the projection of X by Camera 2 (C' 2 ). 
The translation between the cameras is T, while the translation between the projection of 
X is d. Image derived from |5]. 


utilize a convenient relationship that arises in perspective projections from slightly 
different positions, which relates depth to a quantity easily measurable from the photos 
themselves. 

2.2 Depth-Disparity Relationship 

We consider two photographs with parallel image planes, meaning the position of 
the camera in the two photos differs only by a translation T. For our algorithm, the 
translation T must be parallel to the image plane, not perpendicular to it. Figure is 
a simplihed sketch of such a situation. (A more detailed image and discussion can be 
found in [5].) One can see that in such a situation, d, or the distance between a point’s 
projection in one image plane and its projection into the other, depends on how close 
the point is to the camera. In fact, the closer the point is to the camera, the larger d, 
which we will call the “disparity,” will be. So, we observe that: 

1 

z (X - 
d 

where z is the depth of a point. Geometric reasoning with similar triangles or algebraic 
manipulations of the locations of the point on the two image planes conhrms this 
relationship, as well as providing the constant of proportionality. Since 3D modeling 
only requires relative depth of points, and the constant of proportionality involves 
quantities we may not know, the above proportion is sufficient for our purposes. 


3 Results and Discussion: Creating and Print¬ 
ing the Hand 

Given the preliminary ideas above, we will now describe how to extract depth data 
from two photographs. After a general discussion of our algorithm, we will explain 
how it is used with a specihc example. For simplicity’s sake, we will describe how this 
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is done with a contrived scene, using two pictures of a cube with side length of 1 inch 
(see Figure]^. Then, we will show how to apply our method to create and print a 
hand, and suggest directions for future work. 

3.1 The General Idea 

Our method, implemented in Mathematical is a geometric approach that uses two 
photographs of an object, taken from camera positions so that the translation of the 
camera, T in Figure is relatively small. It is based on the inverse relation between 
depth and disparity described in the section above; namely, given two photographs 
with image planes differing only by a small translation, one can recover the depth of a 
3-D point by comparing where it is projected, or its pixel coordinates, in the two image 
planes. 

The key idea is that we only extract coordinates from the digital images for a pre¬ 
defined, relatively small set of points: the corners of approximately planar quadrilateral 
sections of the surface. We then use these coordinates and linear algebra techniques 
to compute the depths of the points within these quadrilateral faces. The calculated 
depths of individual quadrilateral-shaped surfaces are then combined to yield the com¬ 
plete depth data of the surface in question, and to create a depth map of the surface. 
Once this is done, the depth map can be converted easily into an STL mesh, which 
can then be printed. 

3.2 The Transformation Matrix 

For a plane s in one image and its corresponding plane t in the other image (see 
Figure]^ for an example), one can find [3] a transformation matrix Mgt and a constant 
w such that, for any point with pixel coordinates {x,y) G s and pixel coordinates 
{u,v) G t. 


[x y l]Mst = tc[u V l] (1) 

( a d g\ 

b e h \ = w[u V l] (2) 

c / jj 

where w = gx + hy + j. The elements of Mgt can clearly be determined if (x, y) and 

{u,v) are known. For one pair of corresponding pixel coordinates {xi,yi) and {ui,Vi), 
rearranging Eq. [^yields a 2x9 matrix equation with {a ,...,as unknowns: 


a 

b 


Xi yi I 0 0 0 -XiUi -ym 

0 0 Xi yi I -XiVi -yiVi 


d 

e 

f 

9 

h 

J. 


0 

0 


( 3 ) 
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Figure 3: Two example photographs, with corresponding planes s and t highlighted. The 
four pairs of corresponding pixel coordinates are also marked. 


If four distinct pairs of corresponding pixel coordinates {ui,Vi) and {xi,yi) are 
used, four different versions of Eqn. [^are produced and can be combined into one 8x9 
matrix, the nullspace of which yields the coefficients {a,..., j}, up to a multiplicative 
constant. The scaling constant Wi = gxi+hyi+j is determined separately for each point 
ixi,yi) £ s after the elements of Mgt are found. Because the system is homogeneous, 
it is consistent, and the implication of the discussion in [3] is that the rank of the 8x9 
matrix will be 8 as long as the four points are not collinear. So, we will have only one 
free variable. 


3.3 The Cube 

A 3-D model created from two photographs of a cube (see Figure]^ serves to illus¬ 
trate the basic process of the algorithm; models of more complex objects require more 
computation in each step, but do not require additional steps. The cube in the photo¬ 
graph must be partitioned into quadrilateral planar faces with corners distinguishable 
in both photographs, as shown in the figure. The pixel coordinates of the corners in 
one photograph must be recorded and matched with those of the corresponding corner 
in the other photograph. In the current algorithm, the user chooses the quadrilateral 
planar faces and finds the pixel coordinates of the corners manually. The latter can be 
done using any program that displays the pixel coordinates of a given pixel, including 
Microsoft Paint and Mathematica. 

3.3.1 Creating the depth map for one quadrilateral face 

After the quadrilateral faces have been identified and the pixel coordinates of cor¬ 
responding corners have been found, the transformation matrix Mgt for one face of the 
cube can be calculated as described in section 13.21 We will here consider the left-hand 
face of the cube, choosing to find the matrix that maps si to ti, though the inverse 
operation is equivalent in this context. Thus, for any pixel coordinate (x,y) G si, its 
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Figure 4: A visual representation of the depth array of Si, with values in the array interpreted 
as grayscale values (how light or dark a pixel is). Points closest to the camera are closer to 
white, while points further away are closer to black. 


corresponding pixel coordinate {u, v) G ti is given by the equation 

[x y 1] (i- e J) = [u V 1] (4) 

The pixel coordinates (x, y) and (n, v) are the projections of a point p on the cube 
into the image planes of the respective photographs. The true depth D of the point p 
is 


^/{x - uY + {y - vY 

where the denominator of the fraction is the disparity and k is an unknown constant of 
proportionality. Because the 3-D model will be a scaled version of the cube, however, 
relative depth is sufficient, and the value of k can be chosen conveniently. Finding the 
depth of each point on the left-hand face of the cube is simply a matter of calculating 
Equation for each pixel coordinate in si. Note that this depth is measured from the 
camera, so the depths of corners 3 and 4 would be smaller than those of corners 1 and 
2. Given the bottom-up manner in which our model will be printed, it is preferable to 
have the depth of points from the “back” of the cube instead. To do this, we choose k 
so that D <1 for all (x, y) G si, and record 1 — D rather than D. 

Mathematica is capable of interpreting the depth array in a number of ways, two 
of which are shown in Figures and In both, one can see that points outside of si 
have a depth of zero, the line between corners 3 and 4 is closest to the camera, and 
the line between corners 1 and 2 is the furthest away, as expected. The vertical bars 
in which the depth remains the same, reflected either as constant grayscale value or 
constant height, are due to rounding errors. The slight jaggedness of the edges of the 
face, as well as the lines separating different areas of constant depth, is due to the 
discrete nature of the array. The 3-D version of the array, or the depth map, is what 
will eventually be used to create the mesh for our model of the cube. The grayscale 
version is used in the intermediate steps of the process for troubleshooting, as it is 
faster to generate and somewhat easier to manipulate. 
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Figure 5: The 3-D version of the depth array in Figure]^ with values of the array interpreted 
as heights. 


3.3.2 Creating the complete depth map 

Creating the complete depth map for the cube is relatively simple. The process 
described above is duplicated for the right-hand face of the cube, resulting in two depth 
arrays. The constant k must be the same for both arrays, or the relative depths of the 
two faces will be inaccurate. Once the two arrays are computed, they are combined 
to create one large, complete depth array. This step is mostly a matter of matching 
corners, as the upper-right corner of the left face is the same as the upper-right corner 
of the right face. Since the pixel coordinates of this corner in both photographs are 
known, it is possible to determine the position of that corner’s depth values in each 
depth array. When combined, the depth arrays of the two photos must “match up” at 
this position. Given this information and the dimensions of both depth arrays, each 
array is padded with columns or rows of zeros so that their shared corners are in the 
same position in each array, which ensures that the shared edge of the faces is indeed 
shared in the 3D model. The arrays are then simply added element-wise. After the 
depth arrays have been combined, a 3-D version of the large array is generated and 
exported from Mathematica as a mesh, in the STL file format. It is then printed, 
creating the model shown in Figure Note that the 3-D model of the cube is accurate 
to the cube as it appears in the photographs, not as it was originally. Distortions 
occurring due to the perspective projection involved in taking a photograph are not 
corrected. 

3.4 The Hand 

Moving from photographs of a cube to the photographs of the hand, the first al¬ 
teration in the algorithm is imposed by the fact that hands, unlike cubes, are not 
comprised of planar faces with distinct edges and corners. As hands are relatively 
smooth and featureless, it was necessary to alter the hand during the process of tak¬ 
ing the photograph in order to create distinct corners for the quadrilateral faces. As 
shown in Figure a shadow grid was projected onto the hand by shining a common 
1” diameter LED flashlight onto a grid printed on a transparency. The matching pairs 
of pixel coordinates were then found by the user, as with the cube. 
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Figure 6: Left: The complete depth map for the cube, which is used to create the model. 
Right: A 3D printed version of the depth map (a). 


This additional step has the advantage of clearly delineating quadrilateral sections 
of the hand. However, the quadrilateral regions are not necessarily planar; in assum¬ 
ing that they are, as the rest of the algorithm demands, one necessarily produces an 
approximation of the hand as it appears in the photograph. Any curvature within the 
quadrilaterals will not be reflected in the model, as can be seen with the knuckle of the 
index finger. This can be mitigated by using a grid with more squares per inch, which 
has the effect of increasing the accuracy of the approximation. Additionally, since it 
only takes 3 points to determine a plane and the transformation matrix requires 4 
coplanar points, a smaller grid decreases the likelihood that one of the corners of the 
quadrilaterals is not coplanar with the others. 

3.4.1 Creating the complete depth map 

The process of combining the individual depth maps into the complete hand depth 
map is more involved than that for the cube simply because there are more quadrilateral 
faces to consider. The method used to join the depth maps making up the cube can be 
used for the depth arrays of any two adjacent faces; if two depth arrays are combined, 
the resulting array can then be combined with another depth array, providing the two 
faces have an edge in common. Thus, the process of creating the hand depth map is 
that used to create the cube depth map, but applied iteratively. The requirement that 
depth arrays must share an edge to be combined imposes an order on the combination 
process. The “path” of combination can be any path that results in all of the arrays 
being combined; if multiple such paths are available, they yield identical results. 

3.4.2 Correcting the depth map 

The larger number of faces introduces some error in the depth map, which must 
be corrected before printing the model. The boundaries of the quadrilateral faces are 
determined by lines, and the arrays, which are discrete, can only approximate these 
lines. For this reason, some pixels on the boundaries may be interpreted as being in 
multiple quadrilateral faces, and their depth values are stored in multiple arrays. When 
these redundancies occur, the arrays overlap slightly when they are added, resulting 


9 



































in pixels with comparatively large values. In the depth map, this produces large, very 
thin spikes, which interfere with the printing process. This is mitigated by replacing 
these depth values by the average of nearby values, or by manually altering the depth 
array. 


4 Conclusions and Future Directions 

The main limitation of the algorithm is the level of human involvement necessary 
in the initial steps. The user must identify and input the pixel coordinates of match¬ 
ing corners, which becomes prohibitive in models with large numbers of quadrilateral 
faces. This in turn limits the accuracy of the depth map produced because we are 
approximating curved objects with piecewise-defined planes. As a large number of 
small quadrilateral faces yields a better approximation of the original object, the most 
significant area of future research is in automatically detecting corresponding pixel 
coordinates. 

One possible method is suggested by the projection of the shadow grid onto the 
hand. The edges and corners of the quadrilaterals tiling the hand are characterized 
by particularly dark pixels in both photographs. If a pattern of dots were projected 
rather than a grid, only the corners of the quadrilaterals would be marked by pixels 
with a distinct intensity. It would then be possible to recover the locations of the 
pixels with that intensity using Mathematica, though steps would have to be taken 
to ensure the dots were distinct enough from the object to be detected. This would 
find the locations of the corners in each photograph, but would not necessarily match 
them. We developed a rudimentary version of this method that used the order of the 
corner’s detection to identify corresponding corners, but this is clearly unreliable, as 
small rotations of the camera and change in camera angle will change the order in 
which the corners are detected. An effective matching scheme would be necessary to 
make this method feasible. Additionally, some way to detect the edges of the object, 
and to locate the corners of quadrilaterals with one or more edges on the boundary, 
would be necessary, as the projected dots will not necessarily indicate the edge of the 
object accurately. 

Other areas of future development include making full 3-D models, rather than 
the bas-relief type models shown here, which would involve combining the depth maps 
produced by photos from multiple orientations. A related area of research is correcting 
the distortions from the photograph currently preserved in our models, likely using in¬ 
formation presented in [6] about length and angle distortion in perspective projections. 
It is possible that using photographs from multiple orientations will provide enough in¬ 
formation to correct the distortions; if not, correcting them will be a necessary element 
of creating full 3-D models. 
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